Extract embedded captions while processing file #141

Merged: 3 commits, Jan 27, 2025

Conversation

masaball
Contributor

Video files can have caption/subtitle tracks embedded in them. We can make these tracks easier for implementers to use by extracting them during file transcoding/processing and writing them out as additional, separate outputs.
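
To illustrate the idea of writing each embedded track out as its own file, here is a minimal sketch that builds one ffmpeg command per subtitle stream. The helper name `subtitle_extraction_commands` and the output file naming are assumptions made for illustration and are not taken from this pull request; only the ffmpeg options (`-map 0:s:N`, `-c:s webvtt`) are standard ffmpeg usage.

```ruby
# Hypothetical helper, not part of ActiveEncode: returns one ffmpeg argument
# list per embedded subtitle stream, each converting that stream to WebVTT.
def subtitle_extraction_commands(input_path, subtitle_count, output_dir)
  base = File.basename(input_path, ".*")
  (0...subtitle_count).map do |index|
    # Assumed naming scheme: <input name>-subtitle-<stream index>.vtt
    output_path = File.join(output_dir, "#{base}-subtitle-#{index}.vtt")
    # "0:s:N" selects the Nth subtitle stream of the first input file.
    ["ffmpeg", "-y", "-i", input_path,
     "-map", "0:s:#{index}", "-c:s", "webvtt", output_path]
  end
end

# Example: print the commands instead of running them.
subtitle_extraction_commands("/tmp/movie.mp4", 2, "/tmp/out").each do |cmd|
  puts cmd.join(" ")
end
```

Each argument list could be passed to `system(*cmd)`. Note that this only applies to text-based subtitle formats; image-based tracks cannot be converted to WebVTT this way.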

@masaball masaball requested a review from cjcolvar December 11, 2024 19:35
Member

@cjcolvar cjcolvar left a comment

Thinking this over, I'm wondering about changing up the metadata in a few ways. Let me know what you think. I'm happy to talk over anything, especially if this isn't clear.

  • Input tech metadata: replace subtitle_count with subtitles, an array of hashes that each contain language, label, and format (e.g. srt, vtt, tx3g).
  • Output tech metadata: same as the input tech metadata, so output video derivatives can report whether they have subtitles embedded.
  • Fold SubtitleTechnicalMetadata into TechnicalMetadata by adding format and language.
  • Subtitle output files can set format and language through technical metadata, and label through ActiveEncode::Output.
  • I think the subtitle output file format can always be set to vtt and doesn't need to be read from the input metadata.
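
The metadata shape proposed in the list above could look roughly like the following sketch. `SketchMetadata` and `SketchOutput` are hypothetical stand-ins defined here for illustration; they are not the real `ActiveEncode::TechnicalMetadata` or `ActiveEncode::Output` classes, whose attributes may differ.

```ruby
# Hypothetical stand-ins for the technical metadata and output classes.
SketchMetadata = Struct.new(:format, :language, :subtitles, keyword_init: true)
SketchOutput = Struct.new(:label, :tech_metadata, keyword_init: true)

# Input tech metadata: an array of hashes instead of a bare subtitle_count.
input_metadata = SketchMetadata.new(
  subtitles: [
    { language: "eng", label: "English", format: "srt" },
    { language: "spa", label: "Spanish", format: "tx3g" }
  ]
)

# One output per embedded track: label lives on the output, language on its
# technical metadata, and format is always vtt regardless of the input format.
def subtitle_outputs(subtitles)
  subtitles.map do |track|
    SketchOutput.new(
      label: track[:label],
      tech_metadata: SketchMetadata.new(format: "vtt", language: track[:language])
    )
  end
end

subtitle_outputs(input_metadata.subtitles).each do |output|
  puts "#{output.label}: #{output.tech_metadata.language} (#{output.tech_metadata.format})"
end
```

Keeping `label` on the output and `format`/`language` on the metadata mirrors the split suggested in the fourth bullet.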

@masaball masaball requested a review from cjcolvar January 27, 2025 17:43
Member

@cjcolvar cjcolvar left a comment

Looks good!

@masaball masaball merged commit adecfb1 into main Jan 27, 2025
7 checks passed
@masaball masaball deleted the caption_extraction branch January 27, 2025 21:48